{
 "cells": [
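  {
   "cell_type": "markdown",
   "metadata": {
    "_uuid": "09e32975579ebee4f9cee5aad785d6a0e8dc08b0"
   },
   "source": [
    "<div>\n",
    "    <h1><center style=\"background-color:#97CEFA; color:red;\"> Bitcoin Price Trend Forecasting</center></h1>\n",
    "</div>\n",
    "\n",
    "**What Is Bitcoin?**\n",
    "\n",
    "Bitcoin is the longest-running and best-known cryptocurrency. It was first released as open source in 2009 by the pseudonymous Satoshi Nakamoto.\n",
    "\n",
    "Bitcoin serves as a decentralized medium of digital exchange. Transactions are verified and recorded in a public distributed ledger (the blockchain) without the need for a trusted record-keeping authority or central intermediary.\n",
    "\n",
    "Each block of transactions contains the SHA-256 cryptographic hash of the previous block, so the blocks are “chained” together and form an immutable record of every transaction that has ever taken place.\n",
    "\n",
    "**In this notebook we will learn how to:**\n",
    "\n",
    "* Forecast Bitcoin's price trend with **Prophet**, Facebook's open-source forecasting library\n",
    "\n",
    "**Presenter: Huang Jia (黄佳), author of 《零基础学机器学习》**\n",
    "\n",
    "[Book link](https://item.jd.com/12763913.html)\n",
    "\n",
    "**Reference:** This notebook draws on code by Some Aditya Mandal\n",
    "\n",
    "**Resources:** [Book and notebook](https://item.jd.com/12763913.html)\n"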
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 1 **Loading and Preparing the Bitcoin Dataset**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.1 Import the Required Packages"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_uuid": "e9f570203bf9452c02983eeae4259e8534e2ad6a",
    "execution": {
     "iopub.execute_input": "2021-07-01T14:35:10.355818Z",
     "iopub.status.busy": "2021-07-01T14:35:10.355463Z",
     "iopub.status.idle": "2021-07-01T14:35:10.938871Z",
     "shell.execute_reply": "2021-07-01T14:35:10.937901Z",
     "shell.execute_reply.started": "2021-07-01T14:35:10.355693Z"
    }
   },
   "outputs": [],
   "source": [
    "import numpy as np # numerical computing\n",
    "import pandas as pd # data processing\n",
    "import matplotlib.pyplot as plt # data visualization\n",
    "import seaborn as sns # data visualization\n",
    "\n",
    "import datetime, pytz\n",
    "# Convert the Unix timestamps (in seconds) in the CSV file into timezone-aware UTC datetimes\n",
    "def dateparse(time_in_secs):\n",
    "    return datetime.datetime.fromtimestamp(float(time_in_secs), tz=pytz.utc)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.2 Load the Bitcoin Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "_kg_hide-input": true,
    "_uuid": "500e2038829ebe2d7cc32b60aa9a57e3d7be5bb6",
    "execution": {
     "iopub.execute_input": "2021-07-01T14:35:23.056828Z",
     "iopub.status.busy": "2021-07-01T14:35:23.056533Z",
     "iopub.status.idle": "2021-07-01T14:36:06.328393Z",
     "shell.execute_reply": "2021-07-01T14:36:06.327743Z",
     "shell.execute_reply.started": "2021-07-01T14:35:23.05678Z"
    }
   },
   "outputs": [],
   "source": [
    "# Download the dataset from https://www.kaggle.com/datasets/tohuangjia/bitcoin-simple-set and place it in the current directory\n",
    "data = pd.read_csv('./bitstampUSD_1-min_data_2012-01-01_to_2021-03-31.csv', parse_dates=[0], date_parser=dateparse)\n",
    "data[-5:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.3 Clean the Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:36:06.329954Z",
     "iopub.status.busy": "2021-07-01T14:36:06.329697Z",
     "iopub.status.idle": "2021-07-01T14:36:06.734059Z",
     "shell.execute_reply": "2021-07-01T14:36:06.733439Z",
     "shell.execute_reply.started": "2021-07-01T14:36:06.329911Z"
    }
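   },
   "outputs": [],
   "source": [
    "# Drop the timezone, then downsample to hourly data by keeping the first record in each hour\n",
    "data['Timestamp'] = data['Timestamp'].dt.tz_localize(None)\n",
    "data = data.groupby([pd.Grouper(key='Timestamp', freq='H')]).first().reset_index()\n",
    "data = data.set_index('Timestamp')\n",
    "# Keep only the weighted price and forward-fill hours that have no trades\n",
    "data = data[['Weighted_Price']].copy()\n",
    "data['Weighted_Price'] = data['Weighted_Price'].ffill()"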
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:36:06.735975Z",
     "iopub.status.busy": "2021-07-01T14:36:06.735556Z",
     "iopub.status.idle": "2021-07-01T14:36:06.754323Z",
     "shell.execute_reply": "2021-07-01T14:36:06.748988Z",
     "shell.execute_reply.started": "2021-07-01T14:36:06.735784Z"
    }
   },
   "outputs": [],
   "source": [
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.4 Plot the Bitcoin Price Curve"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:36:06.756888Z",
     "iopub.status.busy": "2021-07-01T14:36:06.756455Z",
     "iopub.status.idle": "2021-07-01T14:36:07.689858Z",
     "shell.execute_reply": "2021-07-01T14:36:07.689146Z",
     "shell.execute_reply.started": "2021-07-01T14:36:06.756833Z"
    }
   },
   "outputs": [],
   "source": [
    "color_pal = [\"#F8766D\", \"#D39200\", \"#93AA00\", \"#00BA38\", \"#00C19F\", \"#00B9E3\", \"#619CFF\", \"#DB72FB\"]\n",
    "fig = data.plot(style='', figsize=(15,5), color=color_pal[0], title='BTC Weighted Price (USD) by Hour')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Part 2 **Forecasting the Price Trend with Facebook Prophet**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.1 Import the Prophet Package"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**The first time you run this notebook, install the following two packages.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:36:07.691568Z",
     "iopub.status.busy": "2021-07-01T14:36:07.691078Z",
     "iopub.status.idle": "2021-07-01T14:36:07.694942Z",
     "shell.execute_reply": "2021-07-01T14:36:07.694111Z",
     "shell.execute_reply.started": "2021-07-01T14:36:07.691515Z"
    }
   },
   "outputs": [],
   "source": [
    "# pip install pystan==2.19.1.1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:36:07.696793Z",
     "iopub.status.busy": "2021-07-01T14:36:07.69629Z",
     "iopub.status.idle": "2021-07-01T14:36:07.705297Z",
     "shell.execute_reply": "2021-07-01T14:36:07.70471Z",
     "shell.execute_reply.started": "2021-07-01T14:36:07.696743Z"
    }
   },
   "outputs": [],
   "source": [
    "# pip install prophet"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:36:07.707551Z",
     "iopub.status.busy": "2021-07-01T14:36:07.707062Z",
     "iopub.status.idle": "2021-07-01T14:36:09.043365Z",
     "shell.execute_reply": "2021-07-01T14:36:09.042676Z",
     "shell.execute_reply.started": "2021-07-01T14:36:07.707319Z"
    }
   },
   "outputs": [],
   "source": [
    "from prophet import Prophet # the package installed by `pip install prophet` is imported as `prophet`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.2 Split the Dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:36:09.044854Z",
     "iopub.status.busy": "2021-07-01T14:36:09.044587Z",
     "iopub.status.idle": "2021-07-01T14:36:09.059701Z",
     "shell.execute_reply": "2021-07-01T14:36:09.05874Z",
     "shell.execute_reply.started": "2021-07-01T14:36:09.044808Z"
    }
   },
   "outputs": [],
   "source": [
    "# Train on data up to 2020-12-25 00:00 and hold out everything after it as the test set\n",
    "split_date = '25-Dec-2020'\n",
    "data_train = data.loc[data.index <= split_date].copy()\n",
    "data_test = data.loc[data.index > split_date].copy()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.3 Visualize the Train/Test Split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:36:09.061501Z",
     "iopub.status.busy": "2021-07-01T14:36:09.061218Z",
     "iopub.status.idle": "2021-07-01T14:36:10.516917Z",
     "shell.execute_reply": "2021-07-01T14:36:10.516074Z",
     "shell.execute_reply.started": "2021-07-01T14:36:09.061455Z"
    }
   },
   "outputs": [],
   "source": [
    "fig = data_test \\\n",
    "    .rename(columns={'Weighted_Price': 'Test Set'}) \\\n",
    "    .join(data_train.rename(columns={'Weighted_Price': 'Training Set'}), how='outer') \\\n",
    "    .plot(figsize=(15,5), title='BTC Weighted Price (USD) by Hour', style='')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.4 Prepare the Training Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:36:10.518421Z",
     "iopub.status.busy": "2021-07-01T14:36:10.518145Z",
     "iopub.status.idle": "2021-07-01T14:36:10.526396Z",
     "shell.execute_reply": "2021-07-01T14:36:10.525607Z",
     "shell.execute_reply.started": "2021-07-01T14:36:10.518357Z"
    }
   },
   "outputs": [],
   "source": [
    "# Prophet expects the time column to be named 'ds' and the target column to be named 'y'\n",
    "data_train = data_train.reset_index().rename(columns={'Timestamp':'ds', 'Weighted_Price':'y'})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.5 Build the Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:36:10.528169Z",
     "iopub.status.busy": "2021-07-01T14:36:10.527645Z",
     "iopub.status.idle": "2021-07-01T14:36:10.533631Z",
     "shell.execute_reply": "2021-07-01T14:36:10.532678Z",
     "shell.execute_reply.started": "2021-07-01T14:36:10.527902Z"
    }
   },
   "outputs": [],
   "source": [
    "model = Prophet()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.6 Fit the Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:36:10.535705Z",
     "iopub.status.busy": "2021-07-01T14:36:10.535074Z",
     "iopub.status.idle": "2021-07-01T14:40:16.123972Z",
     "shell.execute_reply": "2021-07-01T14:40:16.12315Z",
     "shell.execute_reply.started": "2021-07-01T14:36:10.535417Z"
    }
   },
   "outputs": [],
   "source": [
    "model.fit(data_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.7 Forecast the Price Trend on the Test Set"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:40:16.127278Z",
     "iopub.status.busy": "2021-07-01T14:40:16.127052Z",
     "iopub.status.idle": "2021-07-01T14:40:24.103569Z",
     "shell.execute_reply": "2021-07-01T14:40:24.102693Z",
     "shell.execute_reply.started": "2021-07-01T14:40:16.127232Z"
    }
   },
   "outputs": [],
   "source": [
    "data_test_fcst = model.predict(df=data_test.reset_index().rename(columns={'Timestamp':'ds'}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.8 Plot the Forecast"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:40:24.105047Z",
     "iopub.status.busy": "2021-07-01T14:40:24.104778Z",
     "iopub.status.idle": "2021-07-01T14:40:24.774472Z",
     "shell.execute_reply": "2021-07-01T14:40:24.773833Z",
     "shell.execute_reply.started": "2021-07-01T14:40:24.105005Z"
    }
   },
   "outputs": [],
   "source": [
    "fig = model.plot(data_test_fcst)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.9 Plot the Forecast Together with the Actual Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T15:05:41.839066Z",
     "iopub.status.busy": "2021-07-01T15:05:41.838781Z",
     "iopub.status.idle": "2021-07-01T15:05:42.579016Z",
     "shell.execute_reply": "2021-07-01T15:05:42.578191Z",
     "shell.execute_reply.started": "2021-07-01T15:05:41.839012Z"
    }
   },
   "outputs": [],
   "source": [
    "f, ax = plt.subplots()\n",
    "# f.set_figheight(5)\n",
    "f.set_figwidth(10)\n",
    "ax.scatter(data_test.index, data_test['Weighted_Price'], color='r')\n",
    "fig = model.plot(data_test_fcst, ax=ax)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.10 Compute the Model's Prediction Error"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "execution": {
     "iopub.execute_input": "2021-07-01T14:40:25.5125Z",
     "iopub.status.busy": "2021-07-01T14:40:25.512219Z",
     "iopub.status.idle": "2021-07-01T14:40:25.688327Z",
     "shell.execute_reply": "2021-07-01T14:40:25.687713Z",
     "shell.execute_reply.started": "2021-07-01T14:40:25.512453Z"
    }
   },
   "outputs": [],
   "source": [
    "from sklearn.metrics import mean_squared_error # mean squared error (MSE)\n",
    "mean_squared_error(y_true=data_test['Weighted_Price'],\n",
    "                   y_pred=data_test_fcst['yhat'])"
   ]
  },
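  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**This completes the project.**\n",
    "\n",
    "As the results show, predicting the future is very hard, especially for a highly volatile asset such as Bitcoin.\n",
    "In the future we can explore other methods for forecasting Bitcoin, then compare and evaluate the results of each approach.\n",
    "\n",
    "For example:\n",
    "* Forecasting Bitcoin's price trend with **LSTM**\n",
    "* Forecasting Bitcoin's price trend with **XGBoost**\n",
    "* Forecasting Bitcoin's price trend with **ARIMA**\n",
    "\n",
    "Stay tuned."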
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
